China's Stock Market: A Story of Regulation, Leverage, and Premium

Data Bootcamp UG Fall 2016

Written by Xingyan Li, NYU Stern Class of 2017

Data Background

China’s capital markets are developing rapidly, and its equity market has ranked No. 2 in the world by total market capitalization since 2009. As of May 2015, the market capitalization of domestic issues on the booming Shanghai (SSE) and Shenzhen (SZSE) exchanges exceeded $10 trillion, and surpassed $14 trillion when Hong Kong (HKEX) is included, highlighting the extraordinary spread of market-based finance in a country led for more than 65 years by its Communist Party. In the summer of 2015, however, China went through a period of stock market turbulence that was worsened by economic weakness, financial panic, and the policy response to these problems.

Mainland China’s equity markets differ from their Hong Kong counterpart because of capital controls imposed under the Communist Party. Although the recent launch of Shanghai-Hong Kong Stock Connect lowers the cost of cross-border transactions, investors in each market still have limited access to shares listed on the other because of restrictions on investor eligibility, stock selection, and investment quotas.

Cross-market indices such as the Hang Seng China AH Premium Index and the Hang Seng China 50 Index are hybrid inventions of China's current political and capital structure. They capture investment opportunities created by exposure to a comprehensive China investment universe (Mainland-listed A and B shares, Hong Kong-listed H shares, Red Chips, and shares of other Mainland companies).

My project focuses primarily on the Hang Seng China AH Premium Index, which tracks the average price difference of A shares over H shares for the largest and most liquid Chinese companies with both A-share and H-share listings (“AH Companies”). The index shows that A shares commonly trade at a premium to shares of the same firms in Hong Kong (H shares). This price discrepancy is worth investigating, especially under the stress test of China's market crash in summer 2015.
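Because A shares are quoted in Chinese Yuan and H shares in Hong Kong dollars, a like-for-like price ratio needs both prices in one currency. The sketch below shows one way such a ratio could be computed for a single company. The function name `ah_price_ratio_pct`, the quotes, and the exchange rate are all hypothetical and chosen only for illustration; this is not the index provider's methodology.

```python
def ah_price_ratio_pct(a_price_cny, h_price_hkd, cny_per_hkd):
    """Return the A-share price as a percentage of the H-share price.

    100 means parity; values above 100 mean the A share trades at a premium.
    """
    h_price_cny = h_price_hkd * cny_per_hkd   # express the H share in yuan
    return a_price_cny / h_price_cny * 100


# Hypothetical quotes and exchange rate, for illustration only
ratio = ah_price_ratio_pct(a_price_cny=30.0, h_price_hkd=25.0, cny_per_hkd=0.8)
print(round(ratio, 2))
```

A result above 100 would correspond to the "A/H Price Ratio (%)" column used later in this notebook being above parity.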

Data Dictionary

"A" shares: shares of Chinese companies listed in Mainland China and traded in the local currency, the Chinese Yuan

"H" shares: shares of Chinese companies listed in Hong Kong and traded in Hong Kong dollars

"B" shares: shares available to foreigners, quoted in U.S. dollars in Shanghai and in Hong Kong dollars in Shenzhen; fewer and fewer investors follow this market, and it may be phased out

Abstract

I will first analyze market trends for both the Shanghai Composite and the HSI (Hang Seng Index) to examine the correlation between two markets that operate under different market regimes. Unlike the HSI, the Shanghai Composite is still not entirely open to foreign investors because of tight capital account controls enforced by the Mainland authorities. In other words, the Shanghai Composite is a market index dominated by domestic investors. Then I will pick companies from the Hang Seng China AH Premium Index and graph their A-share and H-share performance to discuss the premium before and after the 2015 market crash.

Import Packages



In [1]:

    
import sys                             # system module
import pandas as pd                    # data package
import matplotlib as mpl               # graphics package
import matplotlib.pyplot as plt        # pyplot module
import numpy as np                     # foundation for Pandas 
import datetime as dt 
import html5lib

# plotly imports
from plotly.offline import iplot, iplot_mpl  # plotting functions
import plotly.graph_objs as go               # ditto
import plotly                                # just to print version and init notebook
import cufflinks as cf                       # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)

# check versions
print('Python version:', sys.version)
print('Pandas version: ', pd.__version__)
print('Plotly version: ', plotly.__version__)
print('Today: ', dt.date.today())

# Puts plots in notebook 
%matplotlib inline









    











    











    



Python version: 3.5.2 |Anaconda 4.2.0 (x86_64)| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)]
Pandas version:  0.19.0
Plotly version:  1.12.11
Today:  2016-12-22

Creating Datasets: HSI vs. Shanghai Composite in 2015



In [2]:

    
# Read data from Yahoo Finance through DataReader
from pandas_datareader.data import DataReader
from datetime import datetime

# Track the performance of HSI and Shanghai Composite in year of 2015 
start = dt.datetime(2015, 1, 1)  
end = dt.datetime(2015, 12, 31)

# Put shcomp and hsi data into dataframes for further analysis. 
shcomp = DataReader('000001.SS',  'yahoo', start, end) 
shcomp = shcomp['Close']
shcomp = pd.DataFrame(shcomp)

hsi = DataReader('^HSI',  'yahoo', start, end)
hsi = hsi['Close']
hsi = pd.DataFrame(hsi)

# Pick a specific dual-listed company (Ping An) to discuss the premium of its A share relative to its H share
PingAn_H = DataReader('2318.HK',  'yahoo', start, end) 
PingAn_H = PingAn_H['Close']
PingAn_H = pd.DataFrame(PingAn_H)

PingAn_A = DataReader('601318.SS',  'yahoo', start, end)
PingAn_A = PingAn_A['Close']
PingAn_A = pd.DataFrame(PingAn_A)



In [3]:

    
# Reset index for later proper merging of datasets
shcomp = shcomp.reset_index()
hsi = hsi.reset_index()
PingAn_H = PingAn_H.reset_index()
PingAn_A = PingAn_A.reset_index()



In [4]:

    
combo = pd.merge(hsi, shcomp,     # left and right df's
                 how='inner',     # keep only dates present in both frames
                 on='Date'        # link with this variable/column 
                ) 
combo = combo[['Date','Close_x','Close_y']]
combo.columns = ['Date','HSI','SHCOMP']   # flat list of names; a nested list can create a MultiIndex in newer pandas
combo.plot(kind ='line',figsize = (8,6), x = 'Date', subplots = True, title = 'Shanghai Composite VS Hang Seng Index')









    Out[4]:

[Figure: "Shanghai Composite VS Hang Seng Index", two stacked line charts of 2015 daily closes (HSI and SHCOMP) against Date]
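The plot shows the two indices side by side, but the correlation mentioned in the abstract can also be put into a number. A minimal sketch, assuming a frame shaped like `combo` (columns `Date`, `HSI`, `SHCOMP`): the closing levels in `combo_demo` are made up for illustration and are not market data. Correlation is computed on daily returns rather than on index levels, since levels that trend together can look correlated even when day-to-day moves are not.

```python
import pandas as pd

# Made-up closing levels standing in for the merged `combo` frame
combo_demo = pd.DataFrame({
    'Date': pd.date_range('2015-06-01', periods=6, freq='B'),
    'HSI': [27000.0, 27200.0, 26900.0, 26500.0, 26800.0, 26300.0],
    'SHCOMP': [4800.0, 4900.0, 4700.0, 4400.0, 4600.0, 4200.0],
})

# Daily percentage changes; the first row has no prior day and is dropped
returns = combo_demo.set_index('Date').pct_change().dropna()

# Pearson correlation between the two return series
corr = returns['HSI'].corr(returns['SHCOMP'])
print(round(corr, 3))
```

Applied to the real `combo` frame, the same two lines (`pct_change` then `corr`) would give the 2015 return correlation between the two indices.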



In [5]:

    
# Merge the datasets on 'Date' with an inner join, so only trading days present in both frames are kept
combo1 = pd.merge(PingAn_H, PingAn_A,   # left and right df's
                 how='inner',           # intersection of trading days from both Shanghai and Hong Kong 
                 on='Date'              # link with this variable/column 
                ) 
combo1.columns = ['Date','PingAn_H','PingAn_A']   # flat list of names; a nested list can create a MultiIndex in newer pandas
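To make the effect of the inner join concrete, here is a small sketch with toy frames. The dates and prices in `hk` and `sh` are invented; the point is only that a date missing from either exchange's calendar drops out of the merged result.

```python
import pandas as pd

# Toy closes: the Hong Kong frame lacks 2015-01-02, the Shanghai frame lacks 2015-01-06
hk = pd.DataFrame({'Date': pd.to_datetime(['2015-01-05', '2015-01-06', '2015-01-07']),
                   'Close': [60.0, 61.0, 59.5]})
sh = pd.DataFrame({'Date': pd.to_datetime(['2015-01-02', '2015-01-05', '2015-01-07']),
                   'Close': [70.0, 72.0, 71.0]})

# Inner join keeps only the dates that appear in both frames
merged = pd.merge(hk, sh, how='inner', on='Date')
merged.columns = ['Date', 'H', 'A']   # rename Close_x / Close_y with a flat list
print(merged)
```

Only 2015-01-05 and 2015-01-07 survive, which is the behavior wanted here because a price ratio needs both prices on the same day.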



In [6]:

    
# Divide Ping An's A-share close by its H-share close to create a new 'AH Index' column
# (raw ratio of quoted prices; no CNY/HKD conversion is applied)
combo1['AH Index']=combo1['PingAn_A']/combo1['PingAn_H']
combo1.plot(subplots=True,figsize = (8,6), x='Date')









    Out[6]:

[Figure: three stacked line charts against Date: PingAn_H, PingAn_A, and AH Index]

Data Analysis: Daily Returns and General Trend

During China's stock crash in summer 2015, Ping An's A-share listing suffered a more drastic loss than its H-share listing, erasing what had been a persistent AH Index premium.
The summary statistics from combo1.describe() show that Ping An's A-share closing prices have a higher standard deviation than its H-share closing prices (about 23.6 versus 16.7). The summer 2015 crash therefore serves as a stress test of the real valuations of firms listed on China's Mainland exchanges.
China's equity market had rallied since March 2015 before the market crashed in July; during this period the AH Index also skyrocketed for dual-listed companies, as their Mainland shares traded more strongly than their Hong Kong listings. This premium vanished after the crash, however, suggesting a fundamental mispricing of the Mainland's public companies.



In [7]:

    
# Statistical analysis to display standard deviation of Ping An's A share and H share performance
combo1.describe()
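`describe()` summarizes price levels, so its standard deviation mixes day-to-day swings with the overall trend. A common complement is the standard deviation of daily returns. The sketch below assumes a frame with the same column names as `combo1`; the prices in `prices_demo` are invented for illustration.

```python
import pandas as pd

# Invented closes using the same column names as `combo1`
prices_demo = pd.DataFrame({
    'PingAn_H': [60.0, 60.6, 60.0, 60.3, 59.7],
    'PingAn_A': [70.0, 72.1, 69.9, 71.3, 68.4],
})

# Daily percentage change of each listing; the first row is NaN and is dropped
daily_ret = prices_demo.pct_change().dropna()

# Standard deviation of daily returns, one value per listing
vol = daily_ret.std()
print(vol.round(4))
```

On the real data this would be `combo1[['PingAn_H', 'PingAn_A']].pct_change().std()`, which is unit-free and so comparable across the two currencies.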

Data Analysis: Monthly Returns and Volatility

Next I resample the data to monthly frequency and compute monthly returns, which gives a clearer view of the distribution of each dual listing's returns and of its volatility. The companies chosen here are Ping An and Tsingtao Brew, and I use a 3-year period (2013 to 2015) for this analysis.



In [8]:

    
start = dt.datetime(2013, 1, 1)  
end = dt.datetime(2015, 12, 31)    
tickers = ['2318.HK','601318.SS']   
pingan2 = DataReader(tickers, 'yahoo', start, end)
pingan2 = pingan2.to_frame().unstack()['Close']  # unstack the data and choose the "Close" column
pingan2 = pingan2.resample('MS').mean()          # monthly average close, labeled at month start

pingan2pct = pingan2.pct_change().shift(-1)      # shift up so each row holds the change into the following month
pingan2pct = pingan2pct.round(4)*100             # express as a percentage
pingan2pct.plot(kind='line', title='Monthly Return') 
pingan2pct.head(3)

# Histogram showing the distribution of monthly returns for Ping An's two listings
pingan2pct.plot(kind='hist',subplots=True, bins = 30,title="Histogram for 2318.HK and 601318.SS")



    Out[8]:

[Figures: line chart "Monthly Return"; histograms "Histogram for 2318.HK and 601318.SS", one panel per listing]

We can see above that Ping An's A share (601318.SS) had a wider distribution of monthly returns (roughly between -40% and 40%) than its H-share listing, suggesting higher volatility.
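The monthly figures above come from averaging daily closes within each month before taking percentage changes. An alternative convention is to use the last close of each month. The sketch below puts the two side by side on a synthetic price series; `close` is randomly generated and is not market data, and the `.last()` variant is an alternative shown for comparison, not what the cells above compute.

```python
import numpy as np
import pandas as pd

# Synthetic daily closes over four months of business days (not market data)
idx = pd.date_range('2015-01-01', '2015-04-30', freq='B')
rng = np.random.RandomState(0)
close = pd.Series(50 * np.exp(np.cumsum(rng.normal(0, 0.02, len(idx)))), index=idx)

monthly_mean = close.resample('MS').mean()   # average close per month
monthly_last = close.resample('MS').last()   # last close per month

# Month-over-month percentage change under each convention
ret_mean = monthly_mean.pct_change().dropna() * 100
ret_last = monthly_last.pct_change().dropna() * 100

print(pd.DataFrame({'from monthly mean': ret_mean, 'from month-end close': ret_last}).round(2))
```

Averaging within the month smooths the series, so returns built from monthly means will generally look less volatile than returns built from month-end closes.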



In [9]:

    
tickers2 = ['0168.HK','600600.SS']   
tsingtao_brew  = DataReader(tickers2, 'yahoo', start, end)
tsingtao_brew = tsingtao_brew.to_frame().unstack()['Close']  # unstack the data and choose the "Close" column
tsingtao_brew = tsingtao_brew.resample('MS').mean()          # monthly average close, labeled at month start

tsingtao_brewpct = tsingtao_brew.pct_change().shift(-1)      # shift up so each row holds the change into the following month
tsingtao_brewpct = tsingtao_brewpct.round(4)*100             # express as a percentage
tsingtao_brewpct.plot(kind='line', title='Monthly Return') 
tsingtao_brewpct.head(3)

# Histogram showing the distribution of monthly returns for Tsingtao Brew's two listings
tsingtao_brewpct.plot(kind='hist',subplots=True, bins = 30,title="Histogram for 0168.HK and 600600.SS")



    Out[9]:

[Figures: line chart "Monthly Return"; histograms "Histogram for 0168.HK and 600600.SS", one panel per listing]

The distribution of Tsingtao Brew's dual-listing returns again shows greater volatility for the A share (600600.SS) than for its H-share counterpart.

Creating Datasets: HSI AH Premium Index

HSI's official data on the AH Premium Index is available here as an online PDF file. It contains relevant information about the index, such as its launch date and constituents.
I tried several packages to read the data from the PDF, but none of them worked. I therefore collected the data manually from HSI's report and uploaded it as a CSV file to my public GitHub repository so that it can be accessed for analysis.
The dataset below shows each dual listing's industry classification and A/H price ratio. I am interested in which sectors tend to trade at a higher ratio, which I explore by grouping the data and computing the average ratio for each sector.

Data updated and published by HSI in Oct. 2016



In [10]:

    
# Create the path of url/csv to access
path="https://raw.githubusercontent.com/simonxingyanli/kidscoding/master/hsi_ahpremium.csv"
hsi_ahpremium=pd.read_csv(path)
hsi_ahpremium.head(5)









    Out[10]:

      Company Name Industry Classification  A/H Price Ratio (%)
    0         CSCL             Industrials               289.54
    1  Sinopec SSC                  Energy               279.62
    2    GAC Group          Consumer Goods               273.07
    3        COMEC             Industrials               269.86
    4  SH Electric             Industrials               268.67

We have a dataframe of all listings within AH Premium Index



In [11]:

    
# Calculate the average A/H Price Ratio for constituents included in the index
hsi_ahpremium['A/H Price Ratio (%)'].mean()









    Out[11]:





158.2509523809524



In [12]:

    
# Group the data based on industry classification and count the number of companies in each sector
industry_class = hsi_ahpremium[['Industry Classification','A/H Price Ratio (%)']].groupby('Industry Classification')
industry_class.count()









    Out[12]:

                               A/H Price Ratio (%)
    Industry Classification
    Consumer Goods                               8
    Consumer Services                            4
    Energy                                       7
    Financials                                  17
    Industrials                                 10
    Information Technology                       1
    Materials                                    6
    Properties & Construction                    7
    Utilities                                    3

The table above shows the number of companies in each sector. The index is weighted heavily toward traditional sectors such as Financials and Industrials and only lightly toward emerging industries such as Consumer Services and Information Technology.



In [13]:

    
# Group the data again based on industry for future statistical analysis.
counts = hsi_ahpremium.groupby(['Industry Classification','A/H Price Ratio (%)','Company Name']).count()
counts.head()









    Out[13]:

    Industry Classification  A/H Price Ratio (%)  Company Name
    Consumer Goods           90.12                Fuyao Glass
                             113.46               Sh Pharma
                             114.17               Fosun Pharma
                             115.07               Tsingtao Brew
                             126.14               BYD Company



In [14]:

    
# Summarize the sum, mean, std, and number of companies for each sector
hsi_summary = industry_class['A/H Price Ratio (%)'].agg([np.sum, np.mean, np.std, len])
hsi_summary









    Out[14]:

                                   sum        mean        std   len
    Industry Classification
    Consumer Goods             1146.38  143.297500  57.195378   8.0
    Consumer Services           680.05  170.012500  38.539425   4.0
    Energy                     1270.11  181.444286  65.103597   7.0
    Financials                 2030.86  119.462353  14.183785  17.0
    Industrials                2007.02  200.702000  61.709290  10.0
    Information Technology      164.11  164.110000        NaN   1.0
    Materials                  1051.50  175.250000  43.152325   6.0
    Properties & Construction  1060.88  151.554286  35.986654   7.0
    Utilities                   558.90  186.300000  24.727032   3.0

On average, Industrials have the highest premium (a mean ratio of about 201%), followed by Utilities (186%), Energy (181%), and Materials (175%), sectors that are mostly state-owned and traditional. In terms of standard deviation, which here measures how widely the ratio varies across companies within a sector, Energy ranks first (65.1), closely followed by Industrials (61.7).
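A quick way to check the sector ordering is to sort the summary table. The sketch below re-enters the `mean` and `std` columns from the output above into a small frame (named `hsi_summary_demo` here, a stand-in for `hsi_summary`) and sorts each column in descending order.

```python
import pandas as pd

# Mean and std of the A/H price ratio per sector, copied from the summary output above
hsi_summary_demo = pd.DataFrame(
    {'mean': [143.2975, 170.0125, 181.444286, 119.462353, 200.702,
              164.11, 175.25, 151.554286, 186.3],
     'std': [57.195378, 38.539425, 65.103597, 14.183785, 61.70929,
             float('nan'), 43.152325, 35.986654, 24.727032]},
    index=['Consumer Goods', 'Consumer Services', 'Energy', 'Financials', 'Industrials',
           'Information Technology', 'Materials', 'Properties & Construction', 'Utilities'])

# Descending sort; sort_values places NaN (single-company sector) last by default
by_mean = hsi_summary_demo['mean'].sort_values(ascending=False)
by_std = hsi_summary_demo['std'].sort_values(ascending=False)

print(by_mean.head(4))
print(by_std.head(3))
```

With the real frame, `hsi_summary.sort_values('mean', ascending=False)` would give the same ordering directly.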



In [15]:

    
# Creating a pie chart for industry classification and a bar chart for mean of each industry
fig, ax = plt.subplots(2, 1)

hsi_summary.plot.pie(ax=ax[0],
                     figsize=(4,8), 
                     y='len', 
                     legend = False,
                     autopct='%1.0f%%')

hsi_summary.plot.barh(ax=ax[1], 
                      y="mean", 
                      color = ['blue','orange'], 
                      legend = False)
ax[0].set_title("Components of HSI AH Premium Index", fontsize = 10)
ax[1].set_xlabel("Mean A/H Price Ratio (%)", fontsize = 10)   # in a horizontal bar chart the values run along the x-axis









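    Out[15]:

[Figure: pie chart "Components of HSI AH Premium Index" (share of constituents by sector) above a horizontal bar chart of the mean A/H price ratio by sector]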

The pie chart above displays the sector distribution of the AH Premium Index: 27% of listings are Financials and 16% are Industrials, while emerging industries such as Information Technology account for only 2%. China's economic growth is primarily correlated with the performance of SOEs that receive government subsidies and benefit from special policies.
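The percentages quoted above follow directly from the company counts per sector. A short sketch that reproduces them, using the counts from the grouped output earlier in this notebook (the Series name `sector_counts` is introduced here for illustration):

```python
import pandas as pd

# Number of index constituents per sector, copied from the grouped count above
sector_counts = pd.Series({
    'Consumer Goods': 8, 'Consumer Services': 4, 'Energy': 7, 'Financials': 17,
    'Industrials': 10, 'Information Technology': 1, 'Materials': 6,
    'Properties & Construction': 7, 'Utilities': 3,
})

# Each sector's share of all constituents, rounded to whole percent
share_pct = (sector_counts / sector_counts.sum() * 100).round(0)
print(share_pct.sort_values(ascending=False))
```

These shares count companies, not market capitalization, so they describe the composition of the constituent list rather than index weights.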

Conclusions: AH Premium Index & China's Stock Market

China's equity market is less developed than mature markets such as the United States, owing to a lack of derivative products and state intervention in short-selling. About 80% of China's trading population are retail investors, who are often highly levered and inexperienced in stock markets. Wary of upsetting political and social stability, the China Securities Regulatory Commission (CSRC) refuses to delist public companies that have failed to perform for three consecutive quarters. During the 2015 stock crash, this policy induced bad investing and allowed reckless investment in underperforming companies that were essentially worthless on the books.
HSI's AH Premium Index is a useful tool for assessing the valuation of a domestic listing against the same company's listing in Hong Kong, a free market that offers a range of futures and options and is open to foreign hedge funds. The earlier comparison of the HSI and the Shanghai Composite in 2015 shows that the two markets move together, which suggests similar fundamentals, despite the Shanghai Composite's higher standard deviation. What, then, causes the premium in standard deviation and return?
After analyzing the AH Premium Index, I learned that, on average, Industrials carry the highest premium, followed by Utilities, Energy, and Materials, sectors that are mostly state-owned and traditional; Energy and Industrials also show the highest standard deviations. Does this suggest that Chinese SOEs (state-owned enterprises) are overvalued relative to a free market? Prior to the stock crash, state-owned media described a bull market that did not match China's economic growth as measured by GDP and PMI, both of which had been declining.
During the bull market from March 2015 to June 2015, Ping An's AH premium (calculated earlier by dividing the A-share price by the H-share price) surged, then dropped back to its previous level after the crash, a pattern that also applies to other companies in the AH Index. Although the HSI AH Premium Index consists primarily of traditional SOEs, with limited exposure to China's emerging industries, it serves as a unique scorecard of China's economic growth by tracking the premium of domestic listings over Hong Kong listings. The new question is whether this new normal of stability after the crash is sustainable in the long run, especially once China decides to open itself to foreign investors and allow more advanced derivatives.

Sources and References

Official HSI data on AH Premium Index

Seeking Alpha report on trading the index

Research report on China's stock market boom and bust

Output of combo1.describe() (In [7]):

         PingAn_H    PingAn_A    AH Index
count  261.000000  261.000000  261.000000
mean    55.083142   58.501379    1.067574
std     16.707415   23.606698    0.378483
min     35.550000   25.110000    0.687945
25%     43.400000   34.070000    0.787991
50%     47.675000   68.490000    0.832596
75%     57.450000   80.400000    1.527184
max     93.450000   93.170000    1.884556
